NSF PAR Search | NSF Public Access Repository

Reproducible and Portable Big Data Analytics in the Cloud

https://doi.org/10.1109/TCC.2023.3245081

Wang, Xin; Guo, Pei; Li, Xingyan; Gangopadhyay, Aryya; Busart, Carl; Freeman, Jade; Wang, Jianwu (January 2023, IEEE Transactions on Cloud Computing)

Cloud computing has become a major approach to help reproduce computational experiments. Yet there are still two main difficulties in reproducing batch-based big data analytics (including descriptive and predictive analytics) in the cloud. The first is how to automate end-to-end scalable execution of analytics, including distributed environment provisioning, analytics pipeline description, parallel execution, and resource termination. The second is that an application developed for one cloud is difficult to reproduce in another cloud, a problem known as vendor lock-in. To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility, and utilize the adapter design pattern to enable application portability and reproducibility across different clouds. We propose and develop an open-source toolkit that supports 1) fully automated end-to-end execution and reproduction via a single command, 2) automated data and configuration storage for each execution, 3) flexible client modes based on user preferences, 4) execution history query, and 5) simple reproduction of existing executions in the same environment or a different environment. We conducted extensive experiments on both AWS and Azure using four big data analytics applications that run on virtual CPU/GPU clusters. The experiments show that our toolkit can achieve good execution performance, scalability, and efficient reproducibility for cloud-based big data analytics.
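The adapter design pattern mentioned in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `CloudAdapter` interface, the `AwsAdapter` and `AzureAdapter` stubs, and the `execute` helper are hypothetical names, not the toolkit's actual API, and the stubs only record calls instead of contacting a cloud provider.

```python
from abc import ABC, abstractmethod


class CloudAdapter(ABC):
    """Hypothetical vendor-neutral interface (an assumption, not the toolkit's API)."""

    @abstractmethod
    def provision(self, nodes: int) -> str:
        """Create a cluster and return its identifier."""

    @abstractmethod
    def run(self, cluster_id: str, command: str) -> str:
        """Run the analytics command and return a reference to its results."""

    @abstractmethod
    def terminate(self, cluster_id: str) -> None:
        """Release all resources held by the cluster."""


class _StubAdapter(CloudAdapter):
    """Stub that only records calls; a real adapter would wrap a vendor SDK."""

    vendor = "stub"

    def __init__(self) -> None:
        self.log = []

    def provision(self, nodes: int) -> str:
        cluster_id = f"{self.vendor}-cluster-{nodes}"
        self.log.append(f"provision {cluster_id}")
        return cluster_id

    def run(self, cluster_id: str, command: str) -> str:
        self.log.append(f"run {command} on {cluster_id}")
        return f"{self.vendor}://results/{command}"

    def terminate(self, cluster_id: str) -> None:
        self.log.append(f"terminate {cluster_id}")


class AwsAdapter(_StubAdapter):
    vendor = "aws"


class AzureAdapter(_StubAdapter):
    vendor = "azure"


def execute(adapter: CloudAdapter, command: str, nodes: int = 2) -> str:
    """Provision, run, and always terminate, whichever cloud the adapter targets."""
    cluster_id = adapter.provision(nodes)
    try:
        return adapter.run(cluster_id, command)
    finally:
        adapter.terminate(cluster_id)
```

Because `execute` depends only on the shared interface, the same call works with either adapter, for example `execute(AwsAdapter(), "wordcount")` or `execute(AzureAdapter(), "wordcount")`; the `try`/`finally` mirrors the resource-termination step the abstract lists as part of end-to-end execution.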
Full Text Available
Semantic Network Interpretation

https://doi.org/10.1109/WACVW54805.2022.00046

Guo, Pei; Farrell, Ryan (January 2022, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW))

Network interpretation, as an effort to reveal the features learned by a network, remains largely visualization-based. In this paper, our goal is to tackle semantic network interpretation at both the filter and decision levels. For filter-level interpretation, we represent the concepts a filter encodes with a probability distribution of visual attributes. The decision-level interpretation is achieved by textual summarization that generates an explanatory sentence containing clues behind a network’s decision. A Bayesian inference algorithm is proposed to automatically associate filters and network decisions with visual attributes. A human study confirms that the semantic interpretation is a beneficial alternative or complement to visualization methods. We demonstrate the crucial role that semantic network interpretation can play in understanding a network’s failure patterns. More importantly, semantic network interpretation enables a better understanding of the correlation between a model’s performance and its distribution metrics like filter selectivity and concept sparseness.
Full Text Available
Scalable and Flexible Two-Phase Ensemble Algorithms for Causality Discovery

https://doi.org/10.1016/j.bdr.2021.100252

Guo, Pei; Huang, Yiyi; Wang, Jianwu (November 2021, Big Data Research)

Full Text Available
Benchmarking of Data-Driven Causality Discovery Approaches in the Interactions of Arctic Sea Ice and Atmosphere

https://doi.org/10.3389/fdata.2021.642182

Huang, Yiyi; Kleindessner, Matthäus; Munishkin, Alexey; Varshney, Debvrat; Guo, Pei; Wang, Jianwu (August 2021, Frontiers in Big Data)
The Arctic sea ice has retreated rapidly in the past few decades, which is believed to be driven by various dynamic and thermodynamic processes in the atmosphere. The newly open water resulting from sea ice decline in turn exerts a large influence on the atmosphere. Therefore, this study aims to investigate the causality between multiple atmospheric processes and sea ice variations using three distinct data-driven causality approaches that have been proposed recently: Temporal Causality Discovery Framework (TCDF), Non-combinatorial Optimization via Trace Exponential and Augmented Lagrangian for Structure learning (NOTEARS), and Directed Acyclic Graph-Graph Neural Networks (DAG-GNN). We apply these three algorithms to 39 years of historical time-series data sets, which include 11 atmospheric variables from the ERA-5 reanalysis product and passive microwave satellite retrieved sea ice extent. Comparing the causality graphs produced by these approaches with what we summarized from the literature shows that the static graphs produced by NOTEARS and DAG-GNN are relatively reasonable. The results from NOTEARS indicate that relative humidity and precipitation dominate sea ice changes among all variables, while the results from DAG-GNN suggest that the horizontal and meridional wind are more important for driving sea ice variations. However, both approaches produce some unrealistic cause-effect relationships. Additionally, these three methods do not detect well the delayed impact of one variable on another in the Arctic. It also turns out that the results are rather sensitive to the choice of hyperparameters of the three methods. As a pioneering study, this work paves the way to disentangle the complex causal relationships in the Earth system by taking advantage of cutting-edge Artificial Intelligence technologies.
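The "trace exponential" in the NOTEARS name refers to a smooth acyclicity measure, h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W of a d-node graph has no directed cycles. The following is a minimal pure-Python sketch of that measure only, not of the full NOTEARS optimization and not of the study's code; the function name `acyclicity` and the truncated Taylor series used for the matrix exponential are illustrative assumptions.

```python
def acyclicity(w, terms=30):
    """Return h(W) = tr(exp(W o W)) - d, where o is the elementwise product.

    The matrix exponential is approximated by a truncated Taylor series,
    which is adequate for the small matrices used in this illustration.
    """
    d = len(w)
    a = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]  # W o W
    power = [[float(i == j) for j in range(d)] for i in range(d)]  # A^0 / 0!
    trace = float(d)
    for k in range(1, terms):
        # Update power from A^(k-1) / (k-1)! to A^k / k!.
        power = [
            [sum(power[i][m] * a[m][j] for m in range(d)) / k for j in range(d)]
            for i in range(d)
        ]
        trace += sum(power[i][i] for i in range(d))
    return trace - d


# A chain 0 -> 1 -> 2 is acyclic; adding a 1 -> 0 edge creates a cycle.
chain = [[0.0, 0.7, 0.0], [0.0, 0.0, -1.2], [0.0, 0.0, 0.0]]
cyclic = [[0.0, 1.0], [1.0, 0.0]]
```

Here `acyclicity(chain)` evaluates to zero while `acyclicity(cyclic)` is strictly positive, which is what lets a continuous optimizer treat "the graph is a DAG" as an equality constraint.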
Full Text Available
Large-Scale Causality Discovery Analytics as a Service

Wang, Xin; Guo, Pei; Wang, Jianwu (January 2021, 2021 IEEE Big Data Conference)

Data-driven causality discovery is a common way to understand causal relationships among different components of a system. We study how to achieve scalable data-driven causality discovery on Amazon Web Services (AWS) and Microsoft Azure cloud and propose a causality discovery as a service (CDaaS) framework. With this framework, users can easily rerun previous causality discovery experiments or run causality discovery with different setups (such as new datasets or causality discovery parameters). Our CDaaS leverages Cloud Container Registry service and Virtual Machine service to achieve scalable causality discovery with different discovery algorithms. We further conducted extensive experiments and benchmarking of our CDaaS to understand the effects of seven factors (big data engine parameter setting, virtual machine instance number, type, subtype, size, cloud service, cloud provider) and how to best provision cloud resources for our causality discovery service based on certain goals, including execution time, budgetary cost and cost-performance ratio. We report our findings from the benchmarking, which can help obtain optimal configurations based on each application’s characteristics. The findings show that proper configurations could lead to both faster execution time and lower budgetary cost.
Full Text Available
Scalable and Hybrid Ensemble-Based Causality Discovery

https://doi.org/10.1109/SMDS49396.2020.00016

Guo, Pei; Ofonedu, Achuna; Wang, Jianwu (October 2020, 2020 IEEE International Conference on Smart Data Services (SMDS))
Full Text Available
On the regularization of convolutional kernel tensors in neural networks

https://doi.org/10.1080/03081087.2020.1795058

Guo, Pei-Chang; Ye, Qiang (July 2020, Linear and Multilinear Algebra)

Full Text Available
A Deep Learning Model for Detecting Dust in Earth's Atmosphere from Satellite Remote Sensing Data

https://doi.org/10.1109/SMARTCOMP50058.2020.00045

Hou, Ping; Guo, Pei; Wu, Peng; Wang, Jianwu; Gangopadhyay, Aryya; Zhang, Zhibo (September 2020, 2020 IEEE International Conference on Smart Computing (SMARTCOMP))
Full Text Available
Parallel Gradient Boosting based Granger Causality Learning

https://doi.org/10.1109/BigData47090.2019.9005690

Guo, Pei; Liu, Chen; Tang, Yan; Wang, Jianwu (December 2019, 2019 IEEE International Conference on Big Data (Big Data))

Granger causality and its learning algorithms have been widely used in many disciplines to study cause-effect relationships among time series variables. In this paper, we address computing challenges of state-of-the-art Granger causality learning algorithms, especially when facing the increasing dimensionality of available datasets. We study how to leverage gradient boosting meta machine learning techniques to achieve accurate causality discovery and big data parallel techniques for efficient causality discovery from large temporal datasets. We propose two main algorithms: gradient boosting based causality learning and parallel gradient boosting based causality learning. Our experiments show that our proposed algorithms can achieve efficient learning in distributed environments with good learning accuracy.
Full Text Available
Aligned to the Object, Not to the Image: A Unified Pose-Aligned Representation for Fine-Grained Recognition

https://doi.org/10.1109/WACV.2019.00204

Guo, Pei; Farrell, Ryan (January 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV))

Dramatic appearance variation due to pose constitutes a great challenge in fine-grained recognition, one which recent methods using attention mechanisms or second-order statistics fail to adequately address. Modern CNNs typically lack an explicit understanding of object pose and are instead confused by entangled pose and appearance. In this paper, we propose a unified object representation built from pose-aligned regions of varied spatial sizes. Rather than representing an object by regions aligned to image axes, the proposed representation characterizes appearance relative to the object's pose using pose-aligned patches whose features are robust to variations in pose, scale and viewing angle. We propose an algorithm that performs pose estimation and forms the unified object representation as the concatenation of pose-aligned region features, which is then fed into a classification network. The proposed algorithm attains state-of-the-art results on two fine-grained datasets, notably 89.2% on the widely-used CUB-200 dataset and 87.9% on the much larger NABirds dataset. Our success relative to competing methods shows the critical importance of disentangling pose and appearance for continued progress in fine-grained recognition.
Full Text Available
